
    MACHINE LEARNING AND BIOINFORMATIC INSIGHTS INTO KEY ENZYMES FOR A BIO-BASED CIRCULAR ECONOMY

    The world is presently faced with a sustainability crisis: it is becoming increasingly difficult to meet the energy and material needs of a growing global population without depleting and polluting our planet. Greenhouse gases released by the continued combustion of fossil fuels are accelerating climate change, and plastic waste is accumulating in the environment. There is a need for a circular economy, in which energy and materials are renewably derived from waste rather than from limited resources. Deconstructing the recalcitrant linkages in natural and synthetic polymers is crucial for a circular economy, because the resulting monomers can be used to manufacture new products. In Nature, organisms use enzymes to depolymerize and convert macromolecules efficiently. Consequently, biotechnology that employs enzymes industrially holds great promise for the energy- and cost-efficient conversion of materials in a circular economy. However, a deeper molecular-level understanding of enzymes is needed to enable economically viable technologies that can be applied on a global scale. This work is a computational study of key enzymes that catalyze important reactions for a bio-based circular economy. Specifically, bioinformatics and data-mining approaches were employed to study family 7 glycoside hydrolases (GH7s), which are the principal enzymes in Nature for deconstructing cellulose to simple sugars; a cytochrome P450 enzyme (GcoA) that catalyzes the demethylation of lignin subunits; and MHETase, a tannase-family enzyme used by the bacterium Ideonella sakaiensis in the degradation and assimilation of polyethylene terephthalate (PET).
Because enzyme function is fundamentally dependent on the primary amino-acid sequence, we hypothesize that machine-learning algorithms can be trained on an ensemble of functionally related enzymes to reveal functional patterns in the enzyme family, and to map primary sequence to function such that the functional properties of a new enzyme sequence can be predicted with significant accuracy. We find that supervised machine learning identifies residues important for processivity and accurately predicts functional subtypes and domain architectures in GH7s. Bioinformatic analyses revealed conserved active-site residues in GcoA and informed protein engineering that expanded the enzyme's specificity and improved its activity. Similarly, bioinformatic studies and phylogenetic analysis provided evolutionary context for the tannase-family enzyme MHETase and identified residues crucial for its MHET-hydrolase activity. Lastly, we developed machine-learning models to predict enzyme thermostability, allowing high-throughput screening for enzymes that can catalyze reactions at elevated temperatures. Altogether, this work provides a solid basis for a computational, data-driven approach to understanding, identifying, and engineering enzymes for biotechnological applications towards a more sustainable world.
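As a rough illustration of how a primary sequence can be turned into a fixed-length input for sequence-to-function models, the sketch below computes a 20-dimensional amino-acid composition vector. This is an assumption for illustration only: the abstract does not state which sequence features the GH7 or thermostability models used, and `aa_composition` is a hypothetical helper.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code; this fixes the feature order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Non-standard letters (e.g. X, B, Z) and gap characters are ignored,
    so fractions are computed over standard residues only.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No standard residues: return an all-zero feature vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    features = aa_composition("MKKAA")
    print(dict(zip(AMINO_ACIDS, features)))
```

A composition vector discards residue order, so a model that must point to individual sequence positions would instead need position-specific features taken from a multiple sequence alignment.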

    Machine learning reveals sequence-function relationships in family 7 glycoside hydrolases

    Family 7 glycoside hydrolases (GH7s) are among the principal enzymes for cellulose degradation, both in nature and in industry. These enzymes are often bimodular, comprising a catalytic domain and a carbohydrate-binding module (CBM) attached via a flexible linker, and they have an active site that binds cello-oligomers of up to ten glucosyl moieties. GH7 cellulases fall into two major subtypes: cellobiohydrolases (CBHs) and endoglucanases (EGs). Despite the critical importance of GH7 enzymes, gaps remain in our understanding of how GH7 sequence and structure relate to function. Here, we employed machine learning to gain data-driven insights into the relationships between sequence, structure, and function across the GH7 family. Machine-learning models trained only on the number of residues in the active-site loops were able to discriminate GH7 CBHs from EGs with up to 99% accuracy, demonstrating that the lengths of loops A4, B2, B3, and B4 strongly correlate with functional subtype across the GH7 family. We also derived classification rules in which the residue at each of 42 different sequence positions predicted the functional subtype with an accuracy above 87%. A random forest model trained on residues at 19 positions in the catalytic domain predicted the presence of a CBM with 89.5% accuracy. Among its top-performing features, our machine-learning analysis recapitulates a substantial number of the sequence positions that previous experimental studies found to play vital roles in GH7 activity. We surmise that the yet-to-be-explored sequence positions among the top-performing features also contribute to GH7 functional variation and may be exploited to understand and manipulate function.
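The per-position classification rules described above can be illustrated with a minimal sketch: for one column of a multiple sequence alignment, assign each residue the majority subtype (CBH or EG) among the sequences that carry it, then score how often that rule is right. This is an assumed, simplified scheme, not necessarily how the rules in the study were derived; the function names and the toy alignment are hypothetical, and accuracy here is computed on the same sequences used to build the rule.

```python
from collections import Counter, defaultdict


def position_rule_accuracy(aligned_seqs, labels, position):
    """Build and score a one-position rule on aligned sequences.

    The rule maps each residue observed at `position` (0-based alignment
    column) to the most common label among sequences carrying it. The
    returned accuracy is training accuracy, so it is optimistic.
    """
    by_residue = defaultdict(Counter)
    for seq, label in zip(aligned_seqs, labels):
        by_residue[seq[position]][label] += 1

    # Rule: residue -> majority label. Every sequence with that residue is
    # assigned the majority label, so the number correct is the majority count.
    rule = {res: counts.most_common(1)[0][0] for res, counts in by_residue.items()}
    correct = sum(counts.most_common(1)[0][1] for counts in by_residue.values())
    return rule, correct / len(labels)


def rank_positions(aligned_seqs, labels):
    """Return (position, accuracy) pairs sorted from best to worst rule."""
    n_columns = len(aligned_seqs[0])
    scores = [
        (pos, position_rule_accuracy(aligned_seqs, labels, pos)[1])
        for pos in range(n_columns)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy alignment ('-' marks a gap) with subtype labels.
    seqs = ["AGW-", "AGW-", "SGY-", "SGYK"]
    labels = ["CBH", "CBH", "EG", "EG"]
    print(rank_positions(seqs, labels))
```

On real data, the accuracy of each rule would be better estimated with cross-validation or a held-out set of sequences than with training accuracy.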

    Enabling microbial syringol conversion through structure-guided protein engineering

    Microbial conversion of aromatic compounds is an emerging and promising strategy for valorization of the plant biopolymer lignin. A critical and often rate-limiting reaction in aromatic catabolism is O-aryl-demethylation of the abundant aromatic methoxy groups in lignin to form diols, which enables subsequent oxidative aromatic ring-opening. Recently, a cytochrome P450 system, GcoAB, was discovered to demethylate guaiacol (2-methoxyphenol), which can be produced from coniferyl alcohol-derived lignin, to form catechol. However, native GcoAB has minimal ability to demethylate syringol (2,6-dimethoxyphenol), the analogous compound that can be produced from sinapyl alcohol-derived lignin. Despite the abundance of sinapyl alcohol-based lignin in plants, no pathway for syringol catabolism has been reported to date. Here we used structure-guided protein engineering to enable microbial syringol utilization with GcoAB. Specifically, a phenylalanine residue (GcoA-F169) interferes with the binding of syringol in the active site, and upon mutation of this residue to smaller amino acids, efficient syringol O-demethylation is achieved. Crystallography indicates that syringol adopts a productive binding pose in the variant, which molecular dynamics simulations trace to the elimination of a steric clash between the highly flexible side chain of GcoA-F169 and the additional methoxy group of syringol. Finally, we demonstrate in vivo syringol turnover in Pseudomonas putida KT2440 with the GcoA-F169A variant. Taken together, our findings highlight the significant potential and plasticity of cytochrome P450 aromatic O-demethylases in the biological conversion of lignin-derived aromatic compounds.

    WILL PLASTIC POLLUTION COME TO AN END? NEW HOPE FROM AN ENZYME COCKTAIL

    The scientists who re-engineered the plastic-eating enzyme PETase have now created an enzyme 'cocktail' that can digest plastic up to six times faster. Indeed, the scientists took inspiration from Pacman to create a plastic-eating 'cocktail' that could help eliminate plastic waste. It consists of two enzymes, PETase and MHETase, produced by Ideonella sakaiensis, a bacterium that feeds on plastic bottles. Unlike natural degradation, which can take hundreds of years, the super-enzyme can convert plastic back into its original 'building blocks' in just a few days.

    Enzyme kinetics by GH7 cellobiohydrolases on chromogenic substrates is dictated by non-productive binding: insights from crystal structures and MD simulation

    Cellobiohydrolases (CBHs) in glycoside hydrolase family 7 (GH7) (EC 3.2.1.176) are the major cellulose-degrading enzymes, both in industrial settings and in carbon cycling in nature. Small carbohydrate conjugates such as p-nitrophenyl-β-D-cellobioside (pNPC), p-nitrophenyl-β-D-lactoside (pNPL) and methylumbelliferyl-β-D-cellobioside have commonly been used in colorimetric and fluorometric assays of the activity of these enzymes. Despite the similar nature of these compounds, the kinetics of their enzymatic hydrolysis vary greatly between compounds as well as among different enzymes within the GH7 family. Through enzyme kinetics, crystallographic structure determination, molecular dynamics simulations, and fluorometric binding studies using the closely related compound o-nitrophenyl-β-D-cellobioside (oNPC), we examine the different hydrolysis characteristics of these compounds on two model enzymes of this class: TrCel7A from Trichoderma reesei and PcCel7D from Phanerochaete chrysosporium. Crystal structures of the TrCel7A E212Q mutant with pNPC and pNPL, and of wild-type TrCel7A with oNPC, reveal that non-productive binding at the product site is the dominant binding mode for these compounds. The enzyme kinetics results suggest that the strength of non-productive binding is a key determinant of activity on these substrates: PcCel7D consistently shows higher turnover rates (kcat) than TrCel7A, but also higher Michaelis-Menten constants (KM). Furthermore, oNPC proved useful as an active-site probe for fluorometric determination of the dissociation constant of cellobiose on TrCel7A, but could not be used for the same purpose on PcCel7D, likely due to strong binding to an unknown site outside the active site.
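How non-productive binding shifts the measured kinetic parameters can be made concrete with a textbook rapid-equilibrium model, assumed here for illustration rather than taken from the study. The enzyme binds substrate either productively (dissociation constant K_P, turnover kcat) or non-productively (dissociation constant K_N, no turnover). Solving for the initial rate gives apparent parameters kcat,app = kcat / (1 + K_P/K_N) and KM,app = K_P / (1 + K_P/K_N). Stronger non-productive binding (smaller K_N) therefore lowers the observed kcat and KM by the same factor while leaving kcat/KM unchanged; in this simple model, a higher kcat paired with a higher KM, as reported for PcCel7D relative to TrCel7A, is the signature of weaker non-productive binding. The function names below are hypothetical.

```python
def apparent_parameters(kcat, k_p, k_n):
    """Apparent Michaelis-Menten parameters with non-productive binding.

    Assumed rapid-equilibrium scheme:
      E + S <-> ES   (productive, dissociation constant k_p), ES -> E + P at kcat
      E + S <-> ES'  (non-productive, dissociation constant k_n, no turnover)
    Returns (kcat_app, km_app); both are divided by 1 + k_p / k_n.
    """
    factor = 1.0 + k_p / k_n
    return kcat / factor, k_p / factor


def rate(s, e0, kcat, k_p, k_n):
    """Initial rate at substrate concentration s and total enzyme e0.

    Concentrations must share one unit; the rate has units of
    concentration per unit time of kcat.
    """
    kcat_app, km_app = apparent_parameters(kcat, k_p, k_n)
    return kcat_app * e0 * s / (km_app + s)


if __name__ == "__main__":
    # Arbitrary illustrative values: the same productive binding, with
    # strong (k_n = 0.5) versus weak (k_n = 20) non-productive binding.
    for k_n in (0.5, 20.0):
        kcat_app, km_app = apparent_parameters(10.0, 2.0, k_n)
        print(k_n, kcat_app, km_app, kcat_app / km_app)
```

With k_p = 2 and k_n = 0.5, the factor is 5, so both apparent parameters are one fifth of their intrinsic values while their ratio stays at kcat/K_P = 5.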
